2.3 Repeated Holdout Validation

One way to obtain a more robust performance estimate, one that is less sensitive to how we split the data into training and test sets, is to repeat the holdout method k times with different random seeds and compute the average performance over these k repetitions (here k is the number of repetitions, not the k of kNN).
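The procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The splits are stratified per class. The names `stratified_holdout_split`, `repeated_holdout` and the `fit_predict(X_train, y_train, X_test)` interface are hypothetical, introduced only for this example. Each repetition uses its own seed (`seed + r`), so each repetition gets a different random train/test split.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_holdout_split(y, test_frac, rng):
    """Split sample indices into (train, test), taking ~test_frac of each class for testing."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        indices = list(indices)  # copy so shuffling does not touch by_class
        rng.shuffle(indices)
        n_test = round(len(indices) * test_frac)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


def repeated_holdout(X, y, fit_predict, test_frac=0.5, n_repeats=50, seed=0):
    """Run the holdout method n_repeats times, each with a different random seed.

    fit_predict(X_train, y_train, X_test) is a hypothetical interface: it trains a
    model on the training split and returns predicted labels for X_test.
    Returns the list of per-repetition test accuracies.
    """
    accuracies = []
    for r in range(n_repeats):
        rng = random.Random(seed + r)  # new seed -> new train/test split
        train_idx, test_idx = stratified_holdout_split(y, test_frac, rng)
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        X_test = [X[i] for i in test_idx]
        y_test = [y[i] for i in test_idx]
        preds = fit_predict(X_train, y_train, X_test)
        correct = sum(p == t for p, t in zip(preds, y_test))
        accuracies.append(correct / len(y_test))
    return accuracies


def threshold_model(X_train, y_train, X_test):
    """Toy 'model' for the usage example: class 0 below 10, class 1 otherwise."""
    return [0 if x[0] < 10 else 1 for x in X_test]


if __name__ == "__main__":
    X = [[float(i)] for i in range(20)]
    y = [0] * 10 + [1] * 10
    accs = repeated_holdout(X, y, threshold_model, test_frac=0.5, n_repeats=5)
    print(mean(accs))  # the threshold is perfect here, so this prints 1.0
```

The final estimate is `mean(accs)`. The individual values in `accs` are worth keeping as well, because their spread across repetitions is informative.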

This approach is also called Monte Carlo cross-validation.

Compared with the standard holdout validation method, it gives a better estimate of how well the model may perform on a random test set.

It also provides information about the model's stability, that is, how the model produced by a learning algorithm changes with different training set splits.
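The notes do not prescribe how to quantify stability. One simple option, assumed here, is to summarize the spread of the per-repetition accuracies. The helper name `summarize_accuracies` is hypothetical.

```python
from statistics import mean, stdev


def summarize_accuracies(accuracies):
    """Summarize per-repetition holdout accuracies (needs at least two values).

    A small spread suggests that the learned model, and hence its test
    performance, changes little from one train/test split to the next.
    """
    return {
        "mean": mean(accuracies),
        "std": stdev(accuracies),  # sample standard deviation across repetitions
        "min": min(accuracies),
        "max": max(accuracies),
    }
```

For example, accuracies of 0.90, 1.00 and 0.95 give a mean of 0.95 and a sample standard deviation of 0.05.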

Figure 6: Accuracy of kNN (k = 3) on the Iris dataset under repeated holdout validation with 50 repetitions.

Two stratified train/test splits are compared: 50/50 and 90/10.

The mean accuracy is 95% for the 50/50 split and 96% for the 90/10 split.
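The experimental setup can be sketched without external libraries as follows. This is a rough approximation, and its numbers will not match Figure 6. The Iris data is replaced by a hypothetical synthetic stand-in (`make_toy_data`) with the same shape: 3 classes, 50 samples per class and 4 features. The kNN is a plain Euclidean, majority-vote implementation. All function names here are illustrative assumptions.

```python
import math
import random
from collections import Counter, defaultdict
from statistics import mean, stdev


def make_toy_data(n_per_class=50, seed=0):
    """Hypothetical stand-in for Iris: 3 Gaussian classes, 50 samples each, 4 features."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0, 0.0, 0.0), (2.0, 2.0, 0.0, 0.0), (0.0, 2.0, 2.0, 2.0)]
    X, y = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_class):
            X.append([c + rng.gauss(0.0, 1.0) for c in center])
            y.append(label)
    return X, y


def knn_predict(X_train, y_train, X_test, k=3):
    """Plain k-nearest-neighbor classifier: Euclidean distance, majority vote."""
    preds = []
    for x in X_test:
        nearest = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))[:k]
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


def stratified_split(y, test_frac, rng):
    """Shuffle each class separately and send ~test_frac of it to the test set."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_frac)
        test_idx += indices[:n_test]
        train_idx += indices[n_test:]
    return train_idx, test_idx


def repeated_holdout_knn(X, y, test_frac, n_repeats=50, k=3, seed=0):
    """Accuracy of k-NN over n_repeats stratified holdout splits, one seed per split."""
    accs = []
    for r in range(n_repeats):
        train_idx, test_idx = stratified_split(y, test_frac, random.Random(seed + r))
        preds = knn_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx], k=k)
        accs.append(sum(p == y[i] for p, i in zip(preds, test_idx)) / len(test_idx))
    return accs


if __name__ == "__main__":
    X, y = make_toy_data()
    for test_frac in (0.5, 0.1):  # 50/50 and 90/10 train/test splits
        accs = repeated_holdout_knn(X, y, test_frac)
        print(f"train/test {1 - test_frac:.0%}/{test_frac:.0%}: "
              f"mean acc = {mean(accs):.3f}, std = {stdev(accs):.3f}")
```

With 150 samples, the 90/10 split leaves only 15 test samples per repetition. Each accuracy is therefore a multiple of 1/15, which already hints at the larger spread discussed next.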

The figure illustrates two of the points discussed previously:

1. The variance of our estimate increases as the size of the test set decreases.

This is visible in the larger spread of accuracies for the 90/10 split.

2. We see a small increase in the pessimistic bias when we decrease the size of the training set.

This is visible in the lower mean accuracy for the 50/50 split: with only 50% of the data used for training, the average performance is slightly lower.
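A back-of-the-envelope check of both points is sketched below. It is a simplification that assumes Iris's 150 samples and treats the accuracy on a single test split as a binomial proportion. This ignores the extra variation caused by changing training sets, so it is only a rough approximation, not how Figure 6 was produced. The helper `accuracy_standard_error` is a hypothetical name.

```python
import math


def accuracy_standard_error(acc, n_test):
    """Binomial (normal-approximation) standard error of an accuracy measured on n_test samples."""
    return math.sqrt(acc * (1.0 - acc) / n_test)


N_TOTAL = 150  # the Iris dataset has 150 samples (50 per class)

# (train fraction, mean accuracy reported in the notes above)
for train_frac, acc in ((0.5, 0.95), (0.9, 0.96)):
    n_train = round(N_TOTAL * train_frac)
    n_test = N_TOTAL - n_train
    se = accuracy_standard_error(acc, n_test)
    print(f"train={n_train:3d} test={n_test:2d}  SE~{se:.3f}  "
          f"accuracy step=1/{n_test}={1 / n_test:.3f}")
```

The 90/10 split trains on 135 samples instead of 75, which is consistent with its smaller pessimistic bias. However, its 15-sample test set gives a standard error roughly twice as large: about 0.051 versus 0.025 for the 50/50 split.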